UCL Discovery

MPG.PuRe

Simple integrative preprocessing preserves what is shared in data sources

Author: Abhishek Tripathi
AP Gasch
Arto Klami
G Dennis
GR Lanckriet
H Hotelling
HC Causton
J Kettenring
J Nikkilä
JA Berger
JDR Farquhar
M Girolami
ME Ross
PT Spellman
Samuel Kaski
Y Yamanishi
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: The bioinformatics data analysis toolbox needs general-purpose, fast, and easily interpretable preprocessing tools that perform data integration during exploratory data analysis. Our focus is on vector-valued data sources, each consisting of measurements of the same entity but on different variables, and on tasks where source-specific variation is considered noisy or not interesting. Principal components analysis of all sources combined is an obvious choice if it is not important to distinguish between source-specific and shared variation. Canonical Correlation Analysis (CCA) focuses on mutual dependencies and discards source-specific "noise", but it produces a separate set of components for each source. Results: It turns out that the components given by CCA can be combined easily to produce a linear, and hence fast and easily interpretable, feature extraction method. The method fuses several sources such that the properties they share are preserved. Source-specific variation is discarded as uninteresting. We give the details and implement them in a software tool. The method is demonstrated on gene expression measurements in three case studies: classification of cell cycle regulated genes in yeast, identification of differentially expressed genes in leukemia, and defining stress response in yeast. The software package is available at http://www.cis.hut.fi/projects/mi/software/drCCA/. Conclusion: We introduced a method for data fusion in exploratory data analysis, for cases where the statistical dependencies between the sources, and not those within a source, are of interest. The method uses canonical correlation analysis in a new way for dimensionality reduction, and inherits its good properties of being simple, fast, and easily interpretable as a linear projection.
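The idea of combining CCA components into a single linear projection can be illustrated with a minimal sketch. It assumes one common formulation of multi-source CCA: whiten each source separately, concatenate the whitened sources, and run PCA on the result. The function names `whiten` and `fuse_sources` are hypothetical and are not taken from the drCCA package; the abstract does not spell out these steps, so treat this as an illustration rather than the authors' implementation.

```python
import numpy as np

def whiten(X, eps=1e-8):
    """Center X (samples x features) and rescale it to identity covariance."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(Xc) - 1)
    vals, vecs = np.linalg.eigh(cov)
    keep = vals > eps                         # drop numerically null directions
    W = vecs[:, keep] / np.sqrt(vals[keep])   # whitening matrix
    return Xc @ W

def fuse_sources(sources, n_components):
    """Fuse co-occurring data sources into one low-dimensional representation.

    After whitening, within-source variance is equalized, so the leading
    principal components of the concatenation are driven by between-source
    correlation (the shared variation).
    """
    Z = np.hstack([whiten(X) for X in sources])
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T            # linear projection of the data

# Usage: two sources that share one latent signal plus source-specific noise.
rng = np.random.default_rng(0)
shared = rng.normal(size=500)
X1 = np.column_stack([shared + 0.1 * rng.normal(size=500), 5 * rng.normal(size=500)])
X2 = np.column_stack([shared + 0.1 * rng.normal(size=500), 5 * rng.normal(size=500)])
fused = fuse_sources([X1, X2], n_components=1)
```

In this toy setup the high-variance noise columns would dominate a plain PCA of the concatenated raw data, whereas the whitened version favors the direction that both sources share.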

Springer - Publisher Connector

Classifier evaluation and attribute selection against active adversaries

Author: Bowei Xi
CH Teo
Chris Clifton
CP Robert
D Mitra
GR Lanckriet
K Fukunaga
MJ Osborne
Murat Kantarcıoğlu
N Cesa-Bianchi
R Duda
T Basar
T Fawcett
T Vallee
Publication venue: Springer Science and Business Media LLC

Automatic Diagnosis of Pathological Myopia from Heterogeneous Biomedical Data

Author: A Iwase
A Rakotomamonjy
AM Solouki
AW Foong
C Cortes
Chee Keong Kwoh
CS Carlson
D Lowe
D Stambolian
Damon Wing Kee Wong
Dana C. Crawford
GR Lanckriet
H Buch
HG Krumpaszky
J Liu
J Wigginton
Jiang Liu
JS Green
K Mikolajczyk
Q Fan
RE Fan
Seang-Mei Saw
SM Saw
Tien Yin Wong
TIH Consortium
TL Young
Yanwu Xu
YF Shih
YL Li
Z Li
Zhuo Zhang
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

doi:10.1371/journal.pone.0065736, PLoS ONE 8(6)

DR-NTU (Digital Repository of NTU)

ScholarBank@NUS

Predicting gene function using hierarchical multi-label decision tree ensembles

Author: A Clare
B Hayete
C Vens
Celine Vens
D Kocev
Dragi Kocev
E Zdobnov
F Provost
F Wilcoxon
G Obozinski
GR Lanckriet
H Blockeel
H Chua
H Drucker
H Lee
H Mewes
Hendrik Blockeel
J Davis
J Gough
J Quinlan
J Rousu
J Struyf
Jan Struyf
L Breiman
L Pena-Castillo
Leander Schietgat
M Ashburner
M Deng
M Ouali
N Cesa-Bianchi
O Troyanskaya
R Caruana
S Altschul
S Mostafavi
Sašo Džeroski
T Hughes
T Joachims
U Karaoz
W Kim
W Tian
Y Chen
Y Guan
Z Barutcuoglu
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: S. cerevisiae, A. thaliana, and M. musculus are well-studied organisms in biology, and the sequencing of their genomes was completed many years ago. It is still a challenge, however, to develop methods that automatically assign biological functions to the ORFs in these genomes. Different machine learning methods have been proposed to this end, but it remains unclear which method is to be preferred in terms of predictive performance, efficiency, and usability. Results: We study the use of decision tree based models for predicting the multiple functions of ORFs. First, we describe an algorithm for learning hierarchical multi-label decision trees. These can simultaneously predict all the functions of an ORF while respecting a given hierarchy of gene functions (such as FunCat or GO). We present new results obtained with this algorithm, showing that the trees it finds exhibit clearly better predictive performance than the trees found by previously described methods. Nevertheless, the predictive performance of individual trees is lower than that of some recently proposed statistical learning methods. We show that ensembles of such trees are more accurate than single trees and are competitive with state-of-the-art statistical learning and functional linkage methods. Moreover, the ensemble method is computationally efficient and easy to use. Conclusions: Our results suggest that decision tree based methods are a state-of-the-art, efficient, and easy-to-use approach to ORF function prediction.
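A small sketch may help show how an ensemble of hierarchical multi-label trees can be combined. It assumes that each tree outputs a vector of per-class probabilities that already respects the hierarchy (a child class is never more probable than its parent), and that the ensemble simply averages these vectors before thresholding. The toy hierarchy `PARENT` and the function names are hypothetical; the abstract does not specify the combination rule.

```python
import numpy as np

# Hypothetical toy hierarchy: class index -> parent index (None = top level).
PARENT = {0: None, 1: 0, 2: 0, 3: 1}

def ensemble_predict(tree_outputs, threshold):
    """Average per-class probability vectors over trees, then threshold."""
    mean_probs = np.mean(tree_outputs, axis=0)
    return mean_probs, mean_probs >= threshold

def respects_hierarchy(predicted):
    """True if every predicted class also has its parent class predicted."""
    return all(
        parent is None or not predicted[child] or predicted[parent]
        for child, parent in PARENT.items()
    )

# Each row is one tree's output; in every row a child is <= its parent.
tree_outputs = np.array([
    [0.9, 0.6, 0.2, 0.5],
    [0.8, 0.7, 0.1, 0.3],
    [1.0, 0.4, 0.4, 0.4],
])
mean_probs, labels = ensemble_predict(tree_outputs, threshold=0.5)
```

Because averaging preserves the child-not-above-parent ordering, thresholding the averaged vector at any single cut-off yields a label set that is consistent with the hierarchy.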

Springer - Publisher Connector

Leiden University Scholarly Publications

Directing Experimental Biology: A Case Study in Mitochondrial Biogenesis

Author: A Goffeau
A Jaimovich
A Sickmann
AB Owen
AH Tong
Amy A. Caudy
Andrey Rzhetsky
AV Kochetov
BJ Blencowe
Burke
C Andreoli
C Huttenhower
Chad L. Myers
CL Myers
Curtis Huttenhower
David C. Hess
DC Hess
E Nabieva
F Foury
F Perocchi
G Giaever
GR Lanckriet
H Kitano
H Koutnikova
H Prokisch
I Boldogh
I Lee
IR Boldogh
JB Moseley
JM Cherry
Kai Li
L Peña-Castillo
LM Steinmetz
M Ashburner
M Babcock
M Grunstein
M Ogur
MA Hibbs
Matthew A. Hibbs
OG Troyanskaya
Olga G. Troyanskaya
P Pavlidis
R Jansen
S DiMauro
TR Hughes
Z Barutcuoglu
Publication venue: Public Library of Science
Publication date: 01/01/2009

Computational approaches have promised to organize collections of functional genomics data into testable predictions of gene and protein involvement in biological processes and pathways. However, few such predictions have been experimentally validated on a large scale, leaving many bioinformatic methods unproven and underutilized in the biology community. Further, it remains unclear what biological concerns should be taken into account when using computational methods to drive real-world experimental efforts. To investigate these concerns and to establish the utility of computational predictions of gene function, we experimentally tested hundreds of predictions generated from an ensemble of three complementary methods for the process of mitochondrial organization and biogenesis in Saccharomyces cerevisiae. The biological data with respect to the mitochondria are presented in a companion manuscript published in PLoS Genetics (doi:10.1371/journal.pgen.1000407). Here we analyze and explore the results of this study that are broadly applicable for computationalists applying gene function prediction techniques, including a new experimental comparison with 48 genes representing the genomic background. Our study leads to several conclusions that are important to consider when driving laboratory investigations using computational prediction approaches. While most genes in yeast are already known to participate in at least one biological process, we confirm that genes with known functions can still be strong candidates for annotation of additional gene functions. We find that different analysis techniques and different underlying data can both greatly affect the types of functional predictions produced by computational methods. This diversity allows an ensemble of techniques to substantially broaden the biological scope and breadth of predictions. 
We also find that performing prediction and validation steps iteratively allows us to more completely characterize a biological area of interest. While this study focused on a specific functional area in yeast, many of these observations may be useful in the contexts of other processes and organisms.

The Jackson Laboratory: The Mouseion at the JAXlibrary

Bayesian Markov Random Field Analysis for Protein Function Prediction Based on Network Data

Author: A Kuzniar
A Vazquez
Aalt D. J. van Dijk
AJ Enright
C Moler
Cajo J. F. ter Braak
CJF Ter Braak
CM Federovitch
DJC MacKay
GD Bader
GR Lanckriet
H Lee
I Kosmidis
I Ulitsky
Iddo Friedberg
IM Cheeseman
J Besag
JA Hanley
L Milligan
L Peña Castillo
M Ashburner
M Deng
M Punta
Marco C. A. M. Bink
N Nariai
NJ Mulder
P McCullagh
R Sharan
RI Kondor
Roeland C. H. J. van Ham
S Ferré
S Geman
S Letovsky
S Mostafavi
SF Altschul
SR Collins
SZ Li
T Gabaldon
U Karaoz
V Vethantham
XL Chen
Y Chen
Y Guan
Yiannis A. I. Kourmpetis
Z Barutcuoglu
Z Wei
Publication venue: Public Library of Science
Publication date: 01/01/2010

Inference of protein functions is one of the most important aims of modern biology. To fully exploit the large volumes of genomic data typically produced in modern-day genomic experiments, automated computational methods for protein function prediction are urgently needed. Established methods use sequence or structure similarity to infer functions, but those types of data do not suffice to determine the biological context in which proteins act. Current high-throughput biological experiments produce large amounts of data on the interactions between proteins. Such data can be used to infer interaction networks and to predict the biological process that a protein is involved in. Here, we develop a probabilistic approach for protein function prediction using network data, such as protein-protein interaction measurements. We take a Bayesian approach to an existing Markov Random Field method by performing simultaneous estimation of the model parameters and prediction of protein functions. We use an adaptive Markov Chain Monte Carlo algorithm that leads to more accurate parameter estimates and consequently to improved prediction performance compared to the standard Markov Random Field method. We tested our method using a high-quality S. cerevisiae validation network with 1622 proteins against 90 Gene Ontology terms of different levels of abstraction. Compared to three other protein function prediction methods, our approach shows very good prediction performance. Our method can be directly applied to protein-protein interaction or coexpression networks, but can also be extended to use multiple data sources. We apply our method to physical protein interaction data from S. cerevisiae and provide novel predictions, using 340 Gene Ontology terms, for 1170 unannotated proteins, and we evaluate the predictions using the available literature.
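The core building block of a Markov Random Field on an interaction network is the conditional probability that a protein has a function given the labels of its neighbours. The sketch below assumes a simple logistic form with a bias `alpha` and a single interaction weight `beta`; this parameterization and the function name `conditional_prob` are illustrative assumptions, not the model from the paper, and the Bayesian step of sampling the parameters is not shown.

```python
import numpy as np

def conditional_prob(adj, labels, i, alpha, beta):
    """P(protein i has the function | labels of its network neighbours).

    adj    : symmetric 0/1 adjacency matrix of the interaction network
    labels : 0/1 vector, 1 = protein annotated with the function
    alpha  : bias term (assumed parameter)
    beta   : weight on the balance of neighbours with vs. without the function
    """
    neighbours = adj[i] > 0
    n_with = np.sum(labels[neighbours] == 1)
    n_without = np.sum(labels[neighbours] == 0)
    logit = alpha + beta * (n_with - n_without)
    return 1.0 / (1.0 + np.exp(-logit))

# Toy network: protein 0 interacts with proteins 1 and 2.
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]])
p_annotated = conditional_prob(adj, np.array([0, 1, 1]), 0, alpha=0.0, beta=1.0)
p_unannotated = conditional_prob(adj, np.array([0, 0, 0]), 0, alpha=0.0, beta=1.0)
```

In a Gibbs-style sampler, a conditional of this kind would be evaluated repeatedly for each unannotated protein; a fully Bayesian treatment would additionally update `alpha` and `beta` within the same chain.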

Wageningen University & Research Publications

Sparse kernel canonical correlation analysis for discovery of nonlinear interactions in high-dimensional data

Author: D Lin
DM Witten
E Parkhomenko
FR Bach
GR Lanckriet
H Hotelling
I González
I Wilms
JP Vert
Junichiro Yoshimoto
Kenji Doya
Kosuke Yoshida
M Kloft
M Yamada
M Yuan
P Martin
P Ravikumar
S Akaho
S Waaijenborg
Y Yamanishi
Publication venue: Springer Science and Business Media LLC

University of Liverpool Repository

Consensus-Phenotype Integration of Transcriptomic and Metabolomic Data Implies a Role for Metabolism in the Chemosensitivity of Tumour Cells

Using transcriptomic and metabolomic measurements from the NCI60 cell line panel, together with a novel approach to integration of molecular profile data, we show that the biochemical pathways associated with tumour cell chemosensitivity to platinum-based drugs are highly coincident, i.e. they describe a consensus phenotype. Direct integration of metabolome and transcriptome data at the point of pathway analysis improved the detection of consensus pathways by 76%, and revealed associations between platinum sensitivity and several metabolic pathways that were not visible from transcriptome analysis alone. These pathways included the TCA cycle and pyruvate metabolism, lipoprotein uptake, and nucleotide synthesis by both salvage and de novo pathways. Extending the approach across a wide panel of chemotherapeutics, we confirmed the specificity of the metabolic pathway associations to platinum sensitivity. We conclude that metabolic phenotyping could play a role in predicting response to platinum chemotherapy and that consensus-phenotype integration of molecular profiling data is a powerful and versatile tool both for biomarker discovery and for exploring the complex relationships between biological pathways and drug response.

Maastricht University Research Portal

Spiral - Imperial College Digital Repository

MPG.PuRe

Biomedical Discovery Acceleration, with Applications to Craniofacial Development

Author: A Amano
A Baumeister
A Cvekl
A Ferrer-Martinez
A Gabow
A Gavalas
A Hollnagel
A Jaimovich
A Karimpour-Fard
A L'Honore
A Nakaya
A Nazarali
A Subramanian
A Visel
A Yamane
A Zanzoni
AK Ramani
AM Edwards
AY Sivachenko
B Kanzler
BJ Daigle Jr
BT Alako
C Faloutsos
C North
C von Mering
CH Yeang
CL Myers
CM Deane
D Barker
D Eisenberg
D Hanisch
D Hwang
DJ Reiss
DP Hill
DP Tan
DR Rhodes
DS Goldberg
E Nabieva
E Segal
E Sprinzak
E Wingender
EM Marcotte
F Cozman
F Sohler
FM Rijli
GD Bader
GR Lanckriet
H Hishigaki
H Ogata
H Suzuki
H Tipney
Hannah Tipney
HJ Drabkin
HY Chuang
I Iossifov
I Lee
I Xenarios
J Chen
J Cui
J Graw
J Kim
J Li
J Sun
JP Vert
JR Barrow
JS Bader
JT Eppig
L Hedges
L Hunter
L Li
L Salwinski
Lawrence Hunter
M Ashburner
M Bada
M Donalies
M Downes
M Gendron-Maguire
M Kanai-Azuma
M Kanehisa
M Krallinger
M Maconochie
MC Mikl
MP Smidt
MS Scott
MY Galperin
N Daraselia
N Nariai
OG Troyanskaya
P Dupont
P Hunt
P Lipton
P Pei
P Saraiya
P Shannon
PA Gray
PM Bowers
Priyanka Kasliwal
PW Lord
R Bellazzi
R Hoffman
R Jansen
R Saito
Richard A. Spritz
Ronald P. Schuyler
S Asthana
S Brewer
S Draghici
S Imoto
S Kerrien
S Leach
Satoru Miyano
Sonia M. Leach
T Ideker
T Matsumoto
T Schlitt
Trevor Williams
V Ferretti
W Feng
WA Baumgartner
WA Baumgartner Jr
Weiguo Feng
William A. Baumgartner
X Yang
Y Chen
Y Kamei
Y Nakayama
Y Yamanishi
Publication venue: Public Library of Science
Publication date: 01/03/2009

The profusion of high-throughput instruments and the explosion of new results in the scientific literature, particularly in molecular biomedicine, are both a blessing and a curse to the bench researcher. Even knowledgeable and experienced scientists can benefit from computational tools that help navigate this vast and rapidly evolving terrain. In this paper, we describe a novel computational approach to this challenge: a knowledge-based system that combines reading, reasoning, and reporting methods to facilitate analysis of experimental data. Reading methods extract information from external resources, either by parsing structured data or by using biomedical language processing to extract information from unstructured data, and track knowledge provenance. Reasoning methods enrich the knowledge that results from reading by, for example, noting two genes that are annotated to the same ontology term or database entry. Reasoning is also used to combine all sources into a knowledge network that represents the integration of all sorts of relationships between a pair of genes, and to calculate a combined reliability score. Reporting methods combine the knowledge network with a congruent network constructed from experimental data and visualize the combined network in a tool that facilitates the knowledge-based analysis of that data. An implementation of this approach, called the Hanalyzer, is demonstrated on a large-scale gene expression array dataset relevant to craniofacial development. The use of the tool was critical in the creation of hypotheses regarding the roles of four genes never previously characterized as involved in craniofacial development; each of these hypotheses was validated by further experimental work.
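One way to picture a combined reliability score for a gene pair supported by several knowledge sources is a noisy-OR combination, in which the pair is considered unsupported only if every source is wrong. The abstract does not state the scoring formula, so the choice of noisy-OR, the function name `noisy_or`, and the example reliabilities are assumptions made purely for illustration.

```python
def noisy_or(reliabilities):
    """Combine per-source reliabilities (each in [0, 1]) for one gene pair.

    The combined score is the probability that at least one source is right,
    treating the sources as independent.
    """
    all_wrong = 1.0
    for r in reliabilities:
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliability must lie in [0, 1]")
        all_wrong *= 1.0 - r
    return 1.0 - all_wrong

# Hypothetical edge asserted by two sources with different trust levels.
combined = noisy_or([0.9, 0.2])
```

A property of this rule is that adding another supporting source never lowers the combined score, which matches the intuition that independent evidence should accumulate.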